DESCRIPTION

METHOD AND APPARATUS FOR EFFICIENT PRESENTATION 
OF HIGH-QUALITY THREE-DIMENSIONAL AUDIO 

Technical Field 

The invention relates in general to the presentation of audio signals conveying an 
impression of a three-dimensional sound field and more particularly to an efficient method 
and apparatus for high-quality presentations. 

Background 

There is a growing interest in improving methods and systems for audio displays
which can present audio signals conveying accurate impressions of three-dimensional sound 
fields. Such audio displays utilize techniques which model the transfer of acoustic energy 
in a soundfield from one point to another. A frequency-domain form of such models is 
referred to as an acoustic transfer function (ATF) and may be expressed as a function 
H(d,θ,φ,ω) of frequency ω and relative position (d,θ,φ) between two points, where (d,θ,φ)
represents the relative position of the two points in polar coordinates. Other coordinate 
systems may be used. 

Throughout the following discussion, more particular mention is made of various 
frequency-domain transfer functions; however, it should be understood that corresponding 
time-domain impulse response representations exist which may be expressed as a function 
of time t and relative position between points, or h(d,θ,φ,t). The principles and concepts
discussed here are applicable to either domain. 

An ATF may model the acoustical properties of a test subject. In particular, an 
ATF which models the acoustical properties of a human torso, head, ear pinna and ear 
canal is referred to as a head-related transfer function (HRTF). A HRTF describes, with 
respect to a given individual, the acoustic levels and phases which occur near the ear drum 
in response to a given soundfield. The HRTF is typically a function of both frequency and 
relative orientation between the head and the source of the soundfield. A HRTF in the 
form of a free-field transfer function (FFTF) expresses changes in level and phase relative 
to the levels and phase which would exist if the test subject was not in the soundfield; 
therefore, a HRTF in the form of a FFTF may be generalized as a transfer function of the 
form H(θ,φ,ω). The effects of distance can usually be simulated by amplitude attenuation
proportional to the distance. In addition, high-frequency losses can be synthesized by
various functions of distance. Throughout this discussion, the term HRTF and the like 
should be understood to refer to FFTF forms unless a contrary meaning is made clear by 
explanation or by context. 

Many applications comprise acoustic displays utilizing one or more HRTF in 
attempting to "spatialize" or create a realistic three-dimensional aural impression. Acoustic 
displays can spatialize a sound by modelling the attenuation and delay of acoustic signals 
received at each ear as a function of frequency ω and apparent direction relative to head
orientation (θ,φ). An impression that an acoustic signal originates from a particular
relative direction (θ,φ) can be created in a binaural display by applying an appropriate
HRTF to the acoustic signal, generating one signal for presentation to the left ear and a 
second signal for presentation to the right ear, each signal changed in a manner that results 
in the respective signal that would have been received at each ear had the signal actually 
originated from the desired relative direction. 

Empirical evidence has shown that the human auditory system utilizes various cues 
to identify or "localize" the relative position of a sound source. The relationship between 
these cues and relative position is referred to here as listener "localization characteristics"
and may be used to define HRTF. The differences in the amplitude and the time of arrival 
of soundwaves at the left and right ears, referred to as the interaural intensity difference 
(IID) and the interaural time difference (ITD), respectively, provide important cues for
localizing the azimuth or horizontal direction of a source. Spectral shaping and attenuation
of the soundwave provide important cues used to localize elevation or vertical direction of
a source, and to identify whether a source is in front of or in back of a listener. 

Although the type of cues used by nearly all listeners is similar, localization 
characteristics differ. The precise way in which a soundwave is altered varies considerably 
from one individual to another because of considerable variation in the size and shape of 
human torsos, heads and ear pinnae. Under ideal situations, the HRTF incorporated into 
an acoustic display is the personal HRTF of the actual listener because a universal HRTF 
for all individuals does not exist. Additional information regarding the suitability of shared 
HRTF may be obtained from Wightman, et al., "Multidimensional Scaling Analysis of 
Head-Related Transfer Functions," IEEE Workshop on Applications of Signal Proc. to Audio
and Acoust., October 1993.

In many practical systems, however, several HRTF known to work well with a 
variety of individuals are compiled into a library to achieve a degree of sharing. The most 
appropriate HRTF is selected for each listener. Additional information may be obtained
from Wenzel, et al., "Localization Using Nonindividualized Head-Related Transfer
Functions," J. Acoust. Soc. Am., vol. 94, July 1993, pp. 111-123.

The realism of an acoustic display can be enhanced by including ambient effects. 
One important ambient effect is caused by reflections. In most environments, a soundfield 
comprises soundwaves arriving at a particular point, say at an ear, along a direct path from 
the sound source and along paths reflecting off one or more surfaces of walls, floor, ceiling 
and other objects. A soundwave arriving after reflecting off one surface is referred to as a 
first-order reflection. The order of the reflection increases by one for each additional 
reflective surface along the path. The direction of arrival for a reflection is generally not 
the same as that of the direct-path soundwave and, because the propagation path of a 
reflected soundwave is longer than a direct-path soundwave, reflections arrive later. In 
addition, the amplitude and spectral content of a reflection will generally differ because of 
energy absorbing qualities of the reflective surfaces. The combination of high-order 
reflections produces the diffuse soundfields associated with reverberation. 

A HRTF may be constructed to model ambient effects; however, more flexible
displays utilize HRTF which model only the direct-path response and include ambient 
effects synthetically. The effects of a reflection, for example, may be synthesized by 
applying a direct-path HRTF of appropriate direction to a delayed and filtered version of 
the direct-path signal. The appropriate direction is the direction of arrival at the ear, which may
be established by tracing the propagation path of the reflected soundwave. The delay
accounts for the reflective path being longer than the direct path. The filtering alters the 
amplitude and spectrum of the delayed soundwave to account for acoustical properties of 
reflective surfaces, air absorption, nonuniform source radiation patterns and other 
propagation effects. Thus, a HRTF is applied to synthesize each reflection included in the 
acoustic display. 
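
The synthesis of a single reflection described above may be illustrated by the following minimal sketch. The function names, the delay expressed in whole samples and the short impulse responses are assumptions made for illustration only; they are not taken from this disclosure.

```python
def convolve(signal, impulse_response):
    """Direct-form convolution of a time-domain signal with an impulse response."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def synthesize_reflection(direct_signal, delay_samples, surface_filter, hrtf_ir):
    """Delay, filter, then spatialize one reflection (all arguments hypothetical)."""
    # The delay accounts for the reflective path being longer than the direct path.
    delayed = [0.0] * delay_samples + list(direct_signal)
    # The filtering accounts for surface absorption and other propagation effects.
    filtered = convolve(delayed, surface_filter)
    # A direct-path HRTF for the direction of arrival is applied last.
    return convolve(filtered, hrtf_ir)


reflection = synthesize_reflection([1.0, 0.0], 3, [0.5], [1.0, 0.25])
```

In this sketch one HRTF convolution is performed for every reflection, which is the source of the computational cost discussed in the following paragraphs.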

In many acoustic displays, HRTF are implemented as digital filters. Considerable 
computational resources are required to implement accurate HRTF because they are very 
complex functions of direction and frequency. The implementation cost of a high-quality 
display with accurate HRTF is roughly proportional to the complexity and number of filters 
used because the amount of computation required to perform the filters is significant as 
compared to the amount of computation required to perform all other functions. An 
efficient implementation of HRTF filters is needed to reduce implementation costs of high- 
quality acoustic displays. Efficiency is very important for practical displays of complex 
soundfields which include many reflections. The complexity is essentially doubled in
binaural displays and increases further for multiple sources and/or multiple listeners. 

The term "filter" and the like as used here refer to devices which perform an 
operation equivalent to convolving a time-domain signal with an impulse response. 
Similarly, the term "filtering" and the like as used here refer to processes which apply such
a "filter" to a time-domain signal. 

One technique used to increase the efficiency of spatializing late-arriving reflections 
is disclosed in U.S. patent 4,731,848. According to this technique, direct-path soundwaves 
and first-order reflections are processed in a manner similar to that discussed above. The
diffuse soundwaves produced by higher-order reflections are synthesized by a reverberation
network prior to spectral shaping and delays provided by "directionalizers." 

Another technique used to increase the efficiency of spatializing early reflections is 
disclosed in U.S. patent 4,817,149. According to this technique, three separate processes 
are used to spatialize the direct-path soundwave, early reflections and late reflections. The
direct-path soundwave is spatialized by providing front/back and elevation cues through
spectral shaping, and is spatialized in azimuth by including either ITD or IID. The early 
reflections are spatialized by propagation delays and azimuth cues, either ITD or IID, and 
are spectrally shaped as a group to provide "focus" or a sense of spaciousness. The late 
reflections are spatialized in a manner similar to that done for early reflections except that
reverberation and randomized azimuth cues are used to synthesize a more diffuse
soundfield.

These techniques improve the efficiency of spatializing reflections but they do not 
improve the efficiency of spatializing a direct-path soundwave nor do they provide a way to 
more efficiently spatialize binaural displays, to spatialize multiple sources or present a
spatialized display to multiple listeners.

A technique used to more efficiently spatialize an audio signal is implemented in the 
UltraSound™ multimedia sound card by Advanced Gravis Computer Technology Ltd., 
Burnaby, British Columbia, Canada. According to this technique, an initial process 
records several prefiltered versions of an audio signal. The prefiltered signals are obtained
by applying HRTF representing several positions, say four horizontal positions spaced apart
by 90 degrees and one or two positions of specified elevation. Spatialization is
accomplished by mixing the prefiltered signals. In effect, spatialization is accomplished by
panning between fixed sound sources. The spatialization process is fairly efficient and has 
an intuitive appeal; however, it does not provide very good spatialization unless a fairly 
large number of prefiltered signals are used. This is because each of the prefiltered signals
includes ITD, and a soundwave appearing to originate from an intermediate point cannot be
reasonably approximated by a mix of prefiltered signals unless the signals represent
directions fairly close to one another. Limited storage capacity usually restricts the number
of prefiltered signals which can be stored. In addition, the technique imposes a rather
serious disadvantage in that neither the HRTF nor the audio source can be changed without
rerecording the prefiltered signals. This technique is described briefly in Begault, "3-D 
Sound for Virtual Reality and Multimedia," Academic Press, Inc., 1994, p. 210. 

As explained above, accurate HRTF are expensive to implement because they are
complex functions of direction and frequency. Research discussed in Martens, "Principal
Components Analysis and Resynthesis of Spectral Cues to Perceived Direction," ICMC
Proceedings, 1987, pp. 274-281, and in Kistler, et al., "A Model of Head-Related Transfer
Functions Based on Principal Components Analysis and Minimum-Phase Reconstruction,"
J. Acoust. Soc. Am., March 1992, pp. 1637-1647, used principal component analysis to
develop the concept that HRTF can be approximated fairly well by a small number of
fixed-frequency-response basis functions. In particular, Kistler, et al. showed that as few
as five log-magnitude basis functions could reasonably represent a direction-dependent 
portion of HRTF responses, referred to as directional transfer functions (DTF), for each 
ear of ten different test subjects. Direction-independent aspects such as ear canal resonance
were excluded from the principal component analysis. Phase responses of the HRTF were
approximated by ITD which were assumed to be frequency independent. 

Kistler, et al. showed that binaural HRTF for a particular individual and specified 
direction can be approximated by scaling the log-magnitude basis functions with a set of 
weights, combining the scaled functions to obtain composite log-magnitude response
functions representing DTF for each ear, deriving two minimum phase filters from the log-
magnitude response functions, adding excluded direction-independent characteristics such as 
ear canal resonance to derive HRTF representations from the DTF representations, and 
calculating a delay for ITD to simulate phase response. Unfortunately, these basis 
functions do not provide for any improvement in implementation efficiency of HRTF. In
addition, Kistler, et al. concluded that the principal component weights for the five basis
functions were very complex functions of direction and could not be easily modeled. 
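
The first step of the reconstruction attributed above to Kistler, et al., namely scaling log-magnitude basis functions by weights and combining them, may be sketched as follows. The sketch assumes the basis functions are expressed in decibels at a few frequency bins and omits the minimum-phase derivation, the direction-independent terms and the ITD delay; the function name and the numerical values are hypothetical.

```python
def composite_magnitude(basis_db, weights):
    """Weighted sum of log-magnitude basis functions, returned as linear gain.

    basis_db[k][n] is the assumed level in dB of basis function k at bin n;
    weights[k] is the direction-dependent weight for basis function k.
    """
    n_bins = len(basis_db[0])
    composite_db = [
        sum(w * basis[n] for w, basis in zip(weights, basis_db))
        for n in range(n_bins)
    ]
    # Convert the composite log-magnitude response from dB to linear magnitude.
    return [10.0 ** (db / 20.0) for db in composite_db]


magnitude = composite_magnitude([[20.0, 0.0], [0.0, -20.0]], [1.0, 0.5])
```

Because the combination takes place in the log-magnitude domain, a new filter must be derived for every direction, which is consistent with the observation above that these basis functions do not improve implementation efficiency.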

There remains a need for a method to efficiently implement accurate HRTF, 
particularly for acoustic displays which spatialize multiple sources and/or generate unique 
displays for multiple listeners. 
Disclosure of Invention
It is an object of the present invention to provide for a method and apparatus to 
efficiently implement accurate HRTF for high-quality acoustic displays. 

It is another object of the present invention to provide for an efficient method and 
apparatus to spatialize multiple sources.

It is yet another object of the present invention to provide for an efficient method 
and apparatus to spatialize a source for binaural presentation to one or more listeners, for 
monaural presentation to two or more listeners, or for a combination of binaural and 
monaural presentations. 

It is a further object of the present invention to provide for an efficient method and
apparatus to spatialize multiple sources to multiple listeners, allowing for trade off between
accuracy of spatialization and numbers of sources or listeners. 

Other objects and advantages of the present invention may be appreciated by
referring to the following discussion and to the accompanying drawings. 

In accordance with the teachings of the present invention, a method or apparatus for
providing an acoustic display comprises steps or means, respectively, for receiving an
audio signal representing an acoustic source, receiving a location signal representing 
apparent location of the source, applying two or more filters to the audio signal, and 
generating a plurality of output signals by amplifying the output of each filter using
amplifier gains adapted in response to the location signal and combining the amplified
signals. The output signals may provide binaural presentation to one or more listeners, 
monaural presentation to two or more listeners or a combination of binaural and monaural 
presentations. 

In accordance with the teachings of the present invention, a method or apparatus for 
providing an acoustic display comprises steps or means, respectively, for receiving audio
signals representing two or more acoustic sources, receiving location signals representing 
apparent locations of the sources, amplifying each audio signal using amplifier gains 
adapted in response to the location signals, generating two or more intermediate signals by 
combining the amplified audio signals, applying two or more filters to the two or more 
intermediate signals, and generating an output signal by combining the output of each filter.

In accordance with the teachings of the present invention, the method or apparatus 
just described may generate two or more output signals for binaural presentation to one or 
more listeners, monaural presentation to two or more listeners or a combination of binaural 
and monaural presentations by amplifying the output of each filter using amplifier gains 
adapted in response to listener position and/or orientation and generating the two or more
output signals by combining the amplified filtered signals. 

In accordance with the teachings of the present invention, a method or apparatus for 
providing an acoustic display comprises steps or means, respectively, for receiving an 
audio signal representing an acoustic source, receiving a location signal representing
apparent location of the source, rendering a direct-path response by applying a first filter
with a frequency response adapted in response to the location signal, spatializing reflections 
by applying one or more second filters with unvarying frequency response to the audio 
signal and amplifying the output of each second filter using amplifier gain adapted in 
response to the location signal, and generating an output signal by combining signals passed
by the first filter and the second filters. Alternatively, the steps of applying a second filter 
and amplifying with an adaptive gain may be interchanged. 

Each of the methods and apparatuses in accordance with the present invention may 
be modified to also adapt the amplifier gains in response to listener position or personal 
localization characteristics. In preferred embodiments, one or more output signals are
delayed in response to listener position, orientation and/or localization characteristics. The
methods and apparatus may also adapt the amplifier gains and/or introduce delays in 
response to a signal representing ambient characteristics. High-quality displays may also 
filter and scale signals according to source aspect to account for nonuniform source 
radiation patterns and/or according to atmospheric and reflective-surface characteristics to
account for transmission losses. Further, in some embodiments, amplifier gains may be 
adapted to provide for varying numbers of audio signals and/or output signals. 

Throughout this discussion, references to binaural presentations should be 
understood to also refer to presentations utilizing more than two output signals unless the 
context of the discussion makes it clear that only a two-channel presentation is intended.

The present invention may be implemented in many different embodiments and 
incorporated into a wide variety of devices. It is contemplated that the present invention 
will be most frequently practiced using digital signal processing techniques implemented in 
software and/or so-called firmware; however, the principles and teachings may be applied
using other techniques and implementations. The various features of the present invention
and its preferred embodiments may be better understood by referring to the following
discussion and to the accompanying drawings in which like reference numbers refer to like 
features. The contents of the discussion and the drawings are provided as examples only 
and should not be understood to represent limitations upon the scope of the present
invention. 

Brief Description of Drawings
Figure 1 is a functional block diagram illustrating one implementation of HRTF
according to the present invention for use in an acoustic display for presentation of multiple
sources in one output signal. 

Figure 2 is a functional block diagram illustrating one implementation of HRTF 
according to the present invention for use in an acoustic display for presentation of a single 
source in multiple output signals. 
Figure 3 is a functional block diagram illustrating one implementation of HRTF
according to the present invention for use in an acoustic display for presentation of multiple
sources in multiple output signals. 

Figure 4 is a functional block diagram illustrating one implementation of a HRTF 
according to the present invention comprising a hybrid structure of filters with varying and 
unvarying frequency response characteristics.

Figures 5a and 5b are functional block diagrams of filter-amplifier networks.

Figure 6 is a functional block diagram illustrating one implementation of a HRTF
according to the present invention comprising a hybrid structure of filters and an amplifier 
network in which a single set of filters with unvarying frequency response characteristics 
spatializes reflective effects for a single audio source and multiple output signals.

Figures 7a and 7b are functional block diagrams illustrating implementations of 
HRTF according to the present invention in which filters having unvarying frequency 
response characteristics were derived from impulse responses representing ATF such as 
directional transfer functions. 

Modes for Carrying Out the Invention

Multiple Source Signals 
A functional block diagram shown in Figure 1 illustrates one structure of a device 
according to the teachings of the present invention which implements HRTF for multiple 
audio sources. An audio signal representing a first audio source is received from path 101, 
amplified by a first group of amplifiers 111-114 and passed to combiners 121-124.
Another audio signal representing a second audio source is received from path 103, 
amplified by a second group of amplifiers 115-118 and passed to combiners 121-124. 
Combiner 121 combines amplified signals received from amplifiers 111 and 115 and passes
the resulting intermediate signal to filter 131. Combiners 122-124 combine amplified 
signals received from other amplifiers as shown and pass the resulting intermediate signals 
to respective filters 132-134. Filters 131-134 each apply a filter to a respective 
intermediate signal and pass the resulting filtered signals to combiner 151. Combiner 151 
combines the filtered signals and passes the resulting output signal along path 161. 

Location signals received from paths 102 and 104 represent the desired apparent 
locations of the sources of the audio signals received from paths 101 and 103, respectively. 
Respective gains of amplifiers 111-114 in the first group of amplifiers are adapted in 
response to the location signal received from path 102 and respective gains of amplifiers 
115-118 in the second group of amplifiers are adapted in response to the location signal 
received from path 104. 

The structure shown in Figure 1 implements HRTF for two audio sources and can 
be extended to implement HRTF for additional sources by adding a group of amplifiers for 
each additional source and coupling the output of each amplifier in a group to a respective 
combiner which is coupled to the input of a respective filter. The illustrated structure 
comprises four filters but as few as two filters may be used. Very accurate HRTF can 
generally be implemented using no more than twelve to sixteen filters. 
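
The signal flow of Figure 1 may be illustrated by the following minimal sketch, which assumes time-domain sample lists and short impulse responses. The function names and the layout of the gain table are hypothetical and serve only to show that the number of filtering operations depends on the number of filters and not on the number of sources.

```python
def convolve(signal, impulse_response):
    """Direct-form convolution of a time-domain signal with an impulse response."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def mix(signals, gains):
    """Amplify each signal by its gain and combine (amplifiers plus a combiner)."""
    out = [0.0] * max(len(s) for s in signals)
    for s, g in zip(signals, gains):
        for t, v in enumerate(s):
            out[t] += g * v
    return out


def render_sources(sources, input_gains, fixed_filters):
    """input_gains[k][s] is the assumed gain applied to source s ahead of filter k."""
    # One intermediate signal per filter (combiners 121-124 in Figure 1).
    intermediates = [mix(sources, gains) for gains in input_gains]
    # One convolution per filter, regardless of how many sources are present.
    filtered = [convolve(x, h) for x, h in zip(intermediates, fixed_filters)]
    # Final combiner (combiner 151) sums the filtered signals.
    return mix(filtered, [1.0] * len(filtered))


output = render_sources(
    sources=[[1.0, 0.0], [0.0, 1.0]],
    input_gains=[[1.0, 0.25], [0.5, 2.0]],
    fixed_filters=[[1.0], [0.0, 1.0]],
)
```

Adding a further source in this sketch adds only one row of multiplications per filter; the convolutions are unchanged.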

Multiple Output Signals 
A functional block diagram shown in Figure 2 illustrates one structure of a device 
according to the teachings of the present invention which implements HRTF for multiple 
output signals. Each one of filters 131-134 applies a filter to an audio signal received from
path 101 representing an audio source. Filter 131 passes the filtered signal to amplifiers 
141 and 145 which amplify the filtered signal. Filters 132-134 pass filtered signals to 
other amplifiers as shown and each amplifier amplifies a respective filtered signal. 
Combiner 151 combines amplified signals received from amplifiers 141-144 and passes the 
resulting first output signal along path 161. Combiner 152 combines amplified signals 
received from amplifiers 145-148 and passes the resulting second output signal along path 
163. 

A location signal received from path 102 represents the desired apparent location of 
the source of the audio signal received from path 101. Position signals received from paths 
162 and 164 represent position and/or orientation of one or more listeners. For example, 
the two position signals may represent position information for each ear of one listener or 
position information for two listeners. In the embodiment illustrated, respective gains of 
amplifiers 141-144 in a first group of amplifiers are adapted in response to the location
signal received from path 102 and the position signal received from path 162, and 
respective gains of amplifiers 145-148 in a second group of amplifiers are adapted in 
response to the location signal received from path 102 and the position signal received from 
path 164. In alternative embodiments, respective gains of amplifiers in a group of
amplifiers may be adapted in response to only the location signal received from path 102 or
only a respective position signal. 

The multiple output signals may be used to provide binaural presentation to one or 
more listeners, monaural presentation to two or more listeners or a combination of binaural
and monaural presentations. As explained above, the term "binaural" refers to
presentations comprising two or more output signals. 

The structure shown in Figure 2 implements HRTF for two output signals and can 
be extended to implement HRTF for additional output signals by adding a group of 
amplifiers for each additional output and coupling the input of each amplifier in a group to
a respective filter. The illustrated structure comprises four filters but two or more filters
may be used as desired. 
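
The signal flow of Figure 2 may be sketched in the same manner. The function names and gain values below are assumptions for illustration; the sketch shows only that each filter is applied once to the audio signal and that each output signal is a differently weighted combination of the same filtered signals.

```python
def convolve(signal, impulse_response):
    """Direct-form convolution of a time-domain signal with an impulse response."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def mix(signals, gains):
    """Amplify each signal by its gain and combine (amplifiers plus a combiner)."""
    out = [0.0] * max(len(s) for s in signals)
    for s, g in zip(signals, gains):
        for t, v in enumerate(s):
            out[t] += g * v
    return out


def render_outputs(source, fixed_filters, output_gains):
    """output_gains[m][k] is the assumed gain applied to filter k for output m."""
    # Each filter is applied once to the single audio signal (filters 131-134).
    filtered = [convolve(source, h) for h in fixed_filters]
    # Each output has its own group of amplifiers and its own combiner.
    return [mix(filtered, gains) for gains in output_gains]


first_output, second_output = render_outputs(
    source=[1.0, 2.0],
    fixed_filters=[[1.0], [0.0, 1.0]],
    output_gains=[[1.0, 0.0], [0.5, 0.5]],
)
```

An additional output signal in this sketch requires only another row of gains, which corresponds to adding a group of amplifiers and a combiner.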

Multiple Source and Output Signals 
A functional block diagram shown in Figure 3 illustrates one structure of a device 
according to the teachings of the present invention which implements HRTF for multiple 
audio sources and multiple output signals. The structure and operation are substantially a
combination of the structures and operations shown in Figures 1 and 2 and described above 
except that, preferably, the gains of amplifiers 141-148 are not adapted in response to 
location signals received from paths 102 and 104. 
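
The combined structure of Figure 3 may be sketched as two gain stages placed around one bank of filters. The names and the layout of the two gain tables are hypothetical; in keeping with the preference stated above, the output gains in this sketch would be derived from listener position only and not from the source location signals.

```python
def convolve(signal, impulse_response):
    """Direct-form convolution of a time-domain signal with an impulse response."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def mix(signals, gains):
    """Amplify each signal by its gain and combine (amplifiers plus a combiner)."""
    out = [0.0] * max(len(s) for s in signals)
    for s, g in zip(signals, gains):
        for t, v in enumerate(s):
            out[t] += g * v
    return out


def render(sources, input_gains, fixed_filters, output_gains):
    """input_gains[k][s]: source s into filter k; output_gains[m][k]: filter k into output m."""
    # Input stage: gains adapted to source locations (amplifiers 111-118).
    intermediates = [mix(sources, gains) for gains in input_gains]
    # Filter bank: one convolution per filter, shared by all sources and outputs.
    filtered = [convolve(x, h) for x, h in zip(intermediates, fixed_filters)]
    # Output stage: gains adapted to listener position (amplifiers 141-148).
    return [mix(filtered, gains) for gains in output_gains]


outputs = render(
    sources=[[1.0], [2.0]],
    input_gains=[[1.0, 0.0], [0.0, 1.0]],
    fixed_filters=[[1.0], [0.5]],
    output_gains=[[1.0, 1.0], [1.0, -1.0]],
)
```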

In an alternative embodiment discussed below, the respective gains of amplifiers 
111-118 and/or amplifiers 141-148 may be adapted to effectively dedicate certain filters to
particular audio sources and/or output signals to trade off accuracy of spatialization against 
numbers of sources and/or listeners. 

Hybrid Structure 

A functional block diagram shown in Figure 4 illustrates a hybrid filtering structure 
incorporated into a device according to the teachings of the present invention which
implements a HRTF for one audio source and one output signal. Filter 3 and filter 
networks 21 and 22 each apply a filter to an audio signal received from path 101 
representing an audio source. Filter 3 applies a filter having frequency response 
characteristics adapted by response control 10 in response to a location signal received 
from path 102. Filter network 21 applies a filter having unvarying frequency response
characteristics and utilizes an amplifier having a gain adapted by gain control 11 in
response to the location signal received from path 102. Filter network 22 applies a filter 
having unvarying frequency response characteristics and utilizes an amplifier having a gain 
adapted by gain control 12 in response to the location signal received from path 102. The
signals resulting from filter 3 and filter networks 21 and 22 are combined by combiner 151 
and the resulting output signal is passed along path 161. 

The location signal received from path 102 represents the desired apparent location 
of the source of the audio signal received from path 101. In an alternative embodiment,
response control 10 and gain controls 11 and 12 may respond to other signals such as
position signals representing position and/or orientation of a listener, and/or signals 
representing reflection effects. 

As shown in Figures 5a and 5b, the filter networks may be implemented by an 
amplifier 111 with gain adapted in response to gain control 11 and a filter 131. In one
embodiment, the input of the filter is coupled to the output of the amplifier. In another
embodiment, the input of the amplifier is coupled to the output of the filter. 
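
The two arrangements of Figures 5a and 5b yield the same signal because a filter with unvarying frequency response is a linear operation, so a scalar gain may be applied before or after it. The following sketch, with hypothetical names and values, checks this for one short signal under the assumption that the gain is held constant over the signal.

```python
def convolve(signal, impulse_response):
    """Direct-form convolution of a time-domain signal with an impulse response."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def amplify(signal, gain):
    """Scale every sample by a constant gain."""
    return [gain * v for v in signal]


signal = [1.0, 2.0, 3.0]
impulse_response = [0.5, 0.25]
gain = 2.0

# Amplifier output coupled to filter input.
amplifier_first = convolve(amplify(signal, gain), impulse_response)
# Filter output coupled to amplifier input.
filter_first = amplify(convolve(signal, impulse_response), gain)
```

If the gain changes while the signal is passing through the filter, the two orderings are no longer identical sample for sample; the sketch does not address that case.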

In one application, filter 3 implements a direct-path response function for one audio 
source to one ear of one listener and one or more filter networks synthesize the effects of 
reflections for one audio source to both ears of all listeners. Propagation effects on the
reflected soundwaves, including delays, reflective- and transmissive-materials filtering, air
absorption, soundfield spreading losses and source-aspect filtering, may be synthesized by
delaying and filtering signals at various points in the structure but preferably at either the 
input or output of the filter networks. In many applications, reflections may be rendered 
with sufficient accuracy using as few as two or three filter networks. 

In another application, reflections of one audio signal are spatialized for multiple
output signals using only one set of filters having unvarying frequency response
characteristics. Figure 6 illustrates a hybrid structure which synthesizes two reflected 
soundwaves for each of two output signals. The two output signals may be intended for 
binaural presentation to one listener or may be intended for monaural presentation to two
listeners.

Referring to Figure 6, filter 3 generates a direct-path response along path 160 by 
applying a filter to an audio signal received from path 101. Filter 131 applies a filter to 
the audio signal and passes the filtered signal to amplifiers 141, 143, 145 and 147 which 
amplify the filtered signal. Filter 132 applies a filter to the audio signal and passes the 
filtered signal to amplifiers 142, 144, 146 and 148 which amplify the filtered signal.
Combiner 151 combines signals received from amplifiers 141 and 142 and passes the 
combined signal to delay element 171. Combiners 152-154 combine the signals received 
from the remaining amplifiers and pass the combined signals to respective delay elements 
172-174. Combiner 155 combines delayed signals received from delay elements 171 and
172 and passes the resulting signal along path 161. Combiner 156 combines delayed 
signals received from delay elements 173 and 174 and passes the resulting signal along 
path 163. If a binaural presentation is desired, the signals passed along paths 160 and 161 
are combined for presentation to one ear and the output from a second filter 3, not shown,
is combined with the signal passed along path 163 for presentation to the second ear.

A location signal received from path 102 represents the desired apparent position of 
the source of the audio signal received from path 101. An ambient signal also received 
from path 102 represents the reflection geometry of the ambient environment. Position 
signals received from paths 162 and 164 represent position and/or orientation information
for each ear of one listener or position information for two listeners. In the embodiment
illustrated, filter 3 adapts frequency response characteristics in response to the location 
signal and, preferably, in response to the position signal for one listener. Respective gains 
of amplifiers 141-144 are adapted in response to the location signal and the ambient signal 
received from path 102 and the position signal received from path 162, and respective
gains of amplifiers 145-148 are adapted in response to the location signal and the ambient
signal received from path 102 and the position signal received from path 164. The gains of 
these amplifiers are adapted according to the direction of arrival for a reflected soundwave 
to be synthesized. 

Delay elements 171 and 172 impose signal delays of a duration adapted in response 
to the location signal and the ambient signal received from path 102 and the position signal
received from path 162. Delay elements 173 and 174 impose signal delays of a duration 
adapted in response to the location signal and the ambient signal received from path 102 
and the position signal received from path 164. The durations of the respective delays are 
adapted according to the length of the propagation path of respective reflected soundwaves. 
In addition, filtering and/or amplification may be provided with the delays to synthesize
various propagation and ambient effects such as those described above. 
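
As a rough illustration of how a delay duration and an amplifier gain might follow from the length of a reflected propagation path, the sketch below converts path lengths into a delay in samples and a relative gain. The function name, the 343 m/s speed of sound, the 48 kHz default sample rate and the inverse-distance gain rule are all assumptions made for this example; none of them is specified in this description.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed speed of sound in air (illustrative value)


def reflection_delay_and_gain(path_length_m, direct_length_m, sample_rate_hz=48000):
    """Return (delay in whole samples, gain relative to the direct path).

    Hypothetical helper: the delay is the extra travel time of the reflected
    soundwave compared with the direct path, and the gain assumes a simple
    inverse-distance spreading loss with a perfectly reflective surface.
    """
    if path_length_m < direct_length_m:
        raise ValueError("a reflected path cannot be shorter than the direct path")
    extra_time_s = (path_length_m - direct_length_m) / SPEED_OF_SOUND_M_S
    delay_samples = round(extra_time_s * sample_rate_hz)
    relative_gain = direct_length_m / path_length_m
    return delay_samples, relative_gain
```

Under these assumptions, a reflected path that is 3.43 m longer than the direct path arrives 10 ms later, which is 480 samples at 48 kHz. Surface absorption or air absorption could be modeled by further scaling or filtering, as the text above suggests.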

Additional amplifiers, combiners and delay elements may be incorporated into the 
illustrated embodiment to increase the number of synthesized reflected soundwaves and/or 
the number of output signals. These additional components do not significantly increase
the complexity of the HRTF because the number of filters used to synthesize reflections is
unchanged. 

Derivation of Filters 

Efficiency of implementation may be achieved in each of the structures discussed 
above by utilizing an appropriate set of N filters having unvarying frequency response or, 
equivalently, unvarying impulse response characteristics. For discrete-time systems, these 
filters may be derived from an optimization process which derives an impulse response 
$q_j(t_p)$ for each filter in a set of N unit-energy filters that, when weighted and summed,
form a composite impulse response $\hat{h}(\theta,\phi,t_p)$ providing the best approximation to each
impulse response $h(\theta,\phi,t_p)$ in a set of M impulse responses. Preferably, the set H of M
impulse responses represents an individual listener, real or imaginary, having localization
characteristics which represent a large segment of the population of intended listeners. The
set H of M impulse responses may be expressed as

$$H = \{ h(\Theta_i, t_p) \} \quad \text{for } 0 \le p < P$$ (1)

where $\Theta_i$ denotes a particular relative direction $(\theta, \phi)$,

$t_p$ denotes discrete sample times, and

P is the length of the impulse responses in samples.
Preferably, the angular spacing between adjacent directions is no more than 30 to 45
degrees in azimuth and 20 to 30 degrees in elevation. The composite impulse response
$\hat{h}(\Theta_i, t_p)$ of the weighted and summed set of N filter impulse responses may be expressed as

$$\hat{h}(\Theta_i, t_p) = \sum_{j} w_j(\Theta_i) \cdot q_j(t_p)$$ (2)

where $w_j(\Theta_i)$ is the corresponding weight or coefficient for the impulse response of filter j
at direction $\Theta_i$.

The derivation process seeks to optimize the approximation by minimizing the 
square of the approximation error over all impulse responses in the set H, and may be 
expressed as

$$\min \| H - \hat{H} \|_F \quad \text{for } 0 < N < M$$ (3)

where $\|x\|_F$ denotes the Frobenius norm of x, and

$\hat{H}$ is a set of M composite impulse responses $\hat{h}(\Theta_i, t_p)$.
According to expression 2, the set $\hat{H}$ may be expressed as

$$\hat{H} = W \cdot Q$$ (4)

where W denotes an N x M matrix of coefficients $w_j(\Theta_i)$, and

Q denotes a set of N impulse responses $q_j(t_p)$.
This decomposition allows the optimization of expression 3 to be expressed as

$$\min \left( \| H - W \cdot Q \|_F \right)$$ (5)

By recognizing that the Frobenius norm is invariant under orthonormal
transformation, it may be seen that the set of N impulse responses Q are the left singular
vectors associated with the N largest singular values of H and that the coefficient matrix W
is the product of the corresponding right singular vectors and diagonal matrix of singular
values. The Frobenius norm of the approximation error is the square root of the sum of the
squares of the M - N smallest singular values.

The optimization process described above is known as "singular value
decomposition" and derives a set of impulse responses $q_j(t_p)$ which are orthogonal.
Additional information about singular value decomposition and the Frobenius norm may
be obtained from Golub, et al., "Matrix Computations," Johns Hopkins University Press,
2nd ed., 1989, pp. 55-60, 70-78. Other decomposition processes and norms such as
those disclosed by Golub, et al. may be used to derive the W and Q matrices.
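
The derivation by singular value decomposition can be sketched numerically as follows. This is a minimal illustration using NumPy, not the implementation contemplated by this description. It assumes that the set H is stored as a matrix with one impulse response per column (P rows, M columns), so that the left singular vectors are length-P impulse responses. The function name `derive_filters` and the random stand-in data are hypothetical.

```python
import numpy as np


def derive_filters(H, n_filters):
    """Rank-N approximation of a set of impulse responses via SVD.

    Assumption: H holds one impulse response per column (shape P x M).
    Returns (Q, W, error): Q is P x N with orthonormal columns (the N
    unit-energy filters), W is N x M (the weights for each direction), and
    error is the Frobenius norm of H - Q @ W.
    """
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    Q = U[:, :n_filters]                       # left singular vectors
    W = s[:n_filters, None] * Vt[:n_filters]   # singular values times right vectors
    error = np.linalg.norm(H - Q @ W, ord="fro")
    return Q, W, error


rng = np.random.default_rng(0)
H = rng.standard_normal((64, 12))   # synthetic stand-in for measured responses
Q, W, error = derive_filters(H, n_filters=4)

# Singular values that were discarded by keeping only four filters.
discarded = np.linalg.svd(H, compute_uv=False)[4:]
```

For this choice of Q and W, the Frobenius norm of the error equals the square root of the sum of the squared discarded singular values, so `error` and `np.sqrt(np.sum(discarded ** 2))` should agree to within rounding.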

The choice of impulse responses in the set H affects the resultant filters Q. For
example, filters for use in a display providing only azimuthal localization may be derived 
from a set of impulse responses for directions which lie only in the horizontal plane. 
Similarly, filters for use in a display in which azimuthal localization is much more 
important than elevation localization may be derived from a set H which comprises many 
more impulse responses for directions in the horizontal plane than for directions above or 
below the horizontal plane. The set H may comprise impulse responses for a single ear or 
for both ears of one individual or of more than one individual. It should be understood, 
however, that as the number of impulse responses in the set H increases, the number of 
impulse responses in the set Q must also increase to achieve a given level of approximation 
error. 

As another example, a set of filters which optimize only the magnitude response of 
HRTF may be derived from a set H which comprises linear- or minimum-phase impulse 
responses, or impulse responses which are time aligned in some manner. The phase 
response may be synthesized separately by ITD, discussed below.

The optimization process described above assumes that the impulse responses $h(\Theta_i, t_p)$
in set H correspond to HRTF comprising both directionally-dependent aspects and
directionally-independent aspects such as ear canal resonance. The process may also derive
filters from impulse responses corresponding to other ATF such as DTF, for example,
from which a common characteristic has been removed. The derived filters, taken
together, approximate the ATF and the common characteristic excluded from the 
optimization may be provided by a separate filter. This is illustrated in Figures 7a and 7b. 

Referring to Figure 7a, amplifier network 20 amplifies and combines the audio 
signals received from paths 101 and 103 to generate a set of intermediate signals which are 
passed to the set of N filters 131-134 derived by the optimization process. Each of filters
131-134 applies a filter to a respective intermediate signal, combiner 151 combines the
filtered signals to generate a composite signal, and filter 130 generates an output signal
along path 161 by applying a filter having the common characteristics excluded from filters
131-134 to the composite signal. This structure corresponds to the structure illustrated in
Figure 1 and is preferred in applications where the number of audio signals exceeds the
number of output signals. 

Referring to Figure 7b, filter 130 generates an intermediate signal by applying a 
filter having the common characteristics excluded from filters 131-134 to the audio signal
received from path 101, the set of N filters 131-134 derived by the optimization process
each filter the intermediate signal received from filter 130, and amplifier network 40
amplifies and combines the filtered signals to generate output signals along paths 161 and
163. This structure corresponds to the structure illustrated in Figure 2 and is preferred in 
applications where the number of output signals exceeds the number of audio signals. 

It may be of interest to note that if the common characteristic excluded from the
optimization process corresponds to the directionally-independent aspects of HRTF, then
the first derived impulse response $q_1(t_p)$ is substantially equal to the Dirac delta function.
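
The arrangements of Figures 7a and 7b place the common filter at opposite ends of the signal chain. Because every stage is linear and time-invariant, the order does not change the result. The sketch below checks this numerically for the simplified case of one audio signal and one output signal. All signals and filter coefficients are random stand-ins chosen for illustration, and the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(256)            # audio signal
common = rng.standard_normal(8)         # stand-in for the common filter 130
filters = rng.standard_normal((4, 16))  # stand-ins for derived filters 131-134
gains = rng.standard_normal(4)          # amplifier gains for one direction

# Figure 7a ordering: gains, derived filters, combiner, then the common filter.
mixed = sum(np.convolve(g * x, q) for g, q in zip(gains, filters))
out_7a = np.convolve(mixed, common)

# Figure 7b ordering: common filter first, then derived filters and gains.
pre = np.convolve(x, common)
out_7b = sum(g * np.convolve(pre, q) for g, q in zip(gains, filters))
```

The two outputs agree to within rounding, so the choice between the two structures can be made on cost grounds, as described above, rather than on the basis of the response obtained.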

As mentioned above, the number of filters required to achieve a given 
approximation error depends on the impulse responses constituting the set H. Preferably, a 
set of linear- or minimum-phase impulse responses are used because the approximation 
error is expected to decrease more rapidly for increasing N than would occur for impulse
responses including ITD which are not aligned in time with one another. 

An acoustic display incorporating a set of filters and weights derived according to 
the process described above can spatialize an audio signal to any given direction $\Theta_t$ by
calculating a set of weights $w_j(\Theta_t)$ appropriate for the given direction and using the weights
to set amplifier gains. The weights for a given direction can be calculated by linearly
interpolating between weights $w_j(\Theta_i)$ corresponding to the directions $\Theta_i$ closest to the given
direction. 
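
Linear interpolation of the weights can be sketched as follows for the simplified case of azimuth only. The function name, the sorted azimuth grid and the wrap-around handling at 360 degrees are assumptions made for this example; interpolation over both azimuth and elevation would need a two-dimensional scheme that is not shown here.

```python
from bisect import bisect_right


def interpolate_weights(azimuths_deg, weights, target_deg):
    """Linearly interpolate per-filter weights between the two nearest azimuths.

    azimuths_deg: sorted measured azimuths in [0, 360) (hypothetical grid).
    weights: one list of N filter weights per measured azimuth.
    target_deg: desired azimuth; it is wrapped into [0, 360).
    """
    target = target_deg % 360.0
    # Index of the first measured azimuth above the target, wrapping to 0.
    hi = bisect_right(azimuths_deg, target) % len(azimuths_deg)
    lo = (hi - 1) % len(azimuths_deg)
    span = (azimuths_deg[hi] - azimuths_deg[lo]) % 360.0
    frac = ((target - azimuths_deg[lo]) % 360.0) / span if span else 0.0
    return [(1.0 - frac) * a + frac * b for a, b in zip(weights[lo], weights[hi])]
```

A target that coincides with a measured azimuth returns the stored weights for that azimuth unchanged, and a target between the last and first grid entries is interpolated across the 360-degree wrap.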

In concept, each filter convolves a time-domain signal with a respective impulse 
response. Filtering may be accomplished in a variety of ways including recursive or so-called
infinite impulse response (IIR) filters, nonrecursive or so-called finite impulse
response (FIR) filters, lattice filters, or block transforms. No particular filtering technique
is critical to the practice of the present invention; however, it is important to note that the 
composite filter response actually achieved from a filter implemented according to 
expression 2 may not match the desired composite impulse response derived by 
optimization. In preferred embodiments, the filters are checked to ensure that the 
difference between the desired impulse response and the actual impulse response is small. 
This check must take into account both magnitude and phase; therefore, the technique used 
to implement the filters must either preserve phase or otherwise account for changes in 
phase so that correct results are obtained from the weighted sum of the impulse responses. 

Dynamic Reconfiguration 

The function performed by the structure illustrated in Figure 3 may be expressed in 
algebraic form as

$$P(t_p) = W_{out}(\Theta) \cdot Q \cdot W_{in}(\Theta) \cdot S(t_p)$$ (6)

where $P(t_p)$ denotes a column vector of output signals of length $L_{out}$,

$S(t_p)$ denotes a column vector of input signals of length $L_{in}$,

$W_{in}(\Theta)$ denotes an $M \times L_{in}$ matrix of input coefficients,

$W_{out}(\Theta)$ denotes an $L_{out} \times M$ matrix of output coefficients, and

Q denotes an M x M diagonal matrix of filters.
This structure may implement HRTF for each input signal and output signal provided the
matrix product $W_{out}(\Theta) \cdot Q \cdot W_{in}(\Theta)$ can be made to approximate the source-listener HRTF
matrix. This approximation can be made if the matrix product is full rank.

If only one input signal is present, $L_{in}$ equals one, the rank of matrix $W_{in}$ equals one,
and the matrix product may be rewritten as shown in the following expression:

$$X_{out}(\Theta) \cdot Q$$ (7a)

where $X_{out}(\Theta)$ denotes an $L_{out} \times M$ matrix. This condition results in a structure which is
equivalent to the structure illustrated in Figure 2. If only one output signal is needed,
$L_{out}$ equals one, the rank of $W_{out}$ equals one, and the matrix product may be rewritten as shown
in the following expression:

$$Q \cdot X_{in}(\Theta)$$ (7b)

where $X_{in}(\Theta)$ denotes an $M \times L_{in}$ matrix. This condition results in a structure which is
equivalent to the structure illustrated in Figure 1. If the minimum rank of matrices $W_{in}$
and $W_{out}$ is K, however, the matrix product in expression 6 can be rewritten in a form
shown in expressions 7a or 7b if K sets of filters Q are available. If only J < K
sets of filters Q are available, then a rank J approximation of the rank K system may be
used but spatialization performance will be degraded.
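
The single-input case of expression 7a can be checked numerically. The sketch below is illustrative only and makes two simplifying assumptions: the filters are evaluated at a single frequency, so each filter on the diagonal of Q is represented by one complex number, and all gains are random stand-ins. The names `W_in`, `W_out`, `X_out` and `L_out` are labels chosen for this example.

```python
import numpy as np

rng = np.random.default_rng(1)
M, L_out = 4, 2                      # four filters, two output signals

# Response of each filter at one frequency (assumed single-bin view).
q = rng.standard_normal(M) + 1j * rng.standard_normal(M)
Q = np.diag(q)

W_in = rng.standard_normal((M, 1))   # one input signal, so W_in is a single column
W_out = rng.standard_normal((L_out, M))

# Full structure: output gains, filters, input gains (an L_out x 1 transfer matrix).
full = W_out @ Q @ W_in

# Collapsed structure: fold each input gain into the matching output gains.
X_out = W_out * W_in.T               # X_out[l, m] = W_out[l, m] * W_in[m, 0]
collapsed = X_out @ q                # weighted sum of the M filter responses
```

Because Q is diagonal, scaling the signal before a filter is the same as scaling it afterwards, so `full[:, 0]` and `collapsed` agree and the input amplifiers become redundant when there is only one input signal.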

Referring to the structure illustrated in Figure 3, for example, the filters may be 
configured into one set of four filters, two sets of two filters, four sets of one filter, or
three sets each comprising either one or two filters. When configured as one set of four
filters, the structure may implement HRTF for one source signal and any number of output 
signals, as shown in Figure 2, or it may implement HRTF for any number of input signals 
and one output signal, as shown in Figure 1. When configured as two sets of filters, the 
structure may implement HRTF for two source signals and any number of output signals or
for any number of input signals and two output signals. Reconfiguration may be
accomplished by setting the gains in various amplifiers to zero, thereby isolating the filters
from certain input signals or from certain output signals. 
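
One way to picture reconfiguration by zero gains is as a mask applied to the input coefficient matrix. The sketch below builds such a mask for a given partition of the filters into sets. The function name and the convention of assigning consecutive filters to each set are assumptions made for this example.

```python
def configure_input_gains(n_filters, set_sizes):
    """Build a 0/1 mask (n_filters rows, one column per set) for the input gains.

    Column k is nonzero only for the filters assigned to set k, so a zero
    entry isolates that filter from input signal k. Consecutive filters are
    assigned to each set in order (an illustrative convention).
    """
    if sum(set_sizes) != n_filters:
        raise ValueError("set sizes must use every filter exactly once")
    mask = [[0.0] * len(set_sizes) for _ in range(n_filters)]
    row = 0
    for column, size in enumerate(set_sizes):
        for _ in range(size):
            mask[row][column] = 1.0
            row += 1
    return mask
```

For the four-filter structure, `configure_input_gains(4, [2, 2])` gives two sets of two filters, `configure_input_gains(4, [4])` gives one set of four, and `configure_input_gains(4, [1, 1, 2])` gives three sets of one or two filters. The actual gains would be the mask multiplied elementwise by the direction-dependent weights.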

Dynamic reconfiguration is useful in applications which must support a widely 
varying number of sources and listeners because a device of given complexity may easily
trade off the accuracy of spatialization against the smaller of the number of input signals
and output signals. Accuracy of spatialization can sometimes be sacrificed without 
noticeable effect when listener ability to localize is degraded. Such degradation occurs, for 
example, when listeners are distracted, overwhelmed by very large numbers of sound 
sources, or when a sound is difficult to localize. Examples of sounds which are difficult to
localize are those generated by narrow-band or quiet short-duration signals, sounds which
occur in a reverberant environment, or sounds which originate in particular regions such as 
directly overhead or at great distances from the listener. 

Variations and Extensions

In preferred embodiments, the magnitude of HRTF response is implemented by
linear- or minimum-phase filters and the phase of HRTF response is implemented by
delays. Relative delays between left- and right-ear signals produce ITD which is an
important azimuth cue. Delays may also be used to synthesize the arrival of reflections or
to simulate the effects of distance. Filtering and scaling may be used to synthesize
propagation and ambient effects such as air absorption, soundfield spreading losses,
nonuniform source radiation patterns, and transmissive- and reflective-materials
characteristics. This additional processing may be introduced in a wide variety of places. 
Although no particular implementation is critical to the practice of the present invention, 
some implementations are preferred. Preferably, delays, filtering and scaling are
introduced at points in an embodiment which reduce implementation costs. Processing
unique to each source is preferably provided for the audio signal prior to amplification and 
filtering. Processing unique to each output signal is preferably provided for the output 
signal after filtering, amplification and combining. 

Throughout this discussion, reference is made to listener position and/or orientation. 
Orientation refers to the orientation of the head relative to the audio source location. 
Position, as distinguished from orientation, refers to the relative location of the source and 
the center of the head. Listener position and/or orientation may be obtained using a wide 
variety of techniques including mechanical, optical, infrared, ultrasound, magnetic and 
radio-frequency techniques, and no particular way is critical to the practice of the present 
invention. 

Listener position and/or orientation may be sensed using headtracking systems such 
as the Bird magnetic sensor manufactured by Ascension Technology Corporation, 
Burlington, Vermont, or the six-degree-of-freedom ISOTRAK II™, InsideTRAK™ and 
FASTRAK™ sensors manufactured by Polhemus Corporation, Colchester, Vermont. 

The position and orientation of a listener riding in a vehicle may also be sensed by 
using mechanical, magnetic or optical switches to sense vehicle location and orientation. 
This technique is useful for amusement or theme park rides in which listeners are 
transported along a track in capsules or other vehicles. 

The position and orientation of a listener may be sensed from static information 
incorporated into the acoustic display. For example, position and orientation of listeners 
seated in a motion picture theater or seated around a conference table may be presumed 
from information describing the theater or table geometry. 

Amplifier gain and/or time delays may be adapted to synthesize ambient effects in 
response to signals describing the simulated environment. Longer delays may be used to 
simulate the reverberance of larger rooms or concert halls, or to simulate echoes from 
distant structures. Highly reflective acoustic environments may be simulated by 
incorporating a large number of reflections with increased gain for late reflections. The 
perception of distance from the audio source can be strengthened by controlling the relative 
gain for reflected soundwaves and direct path soundwaves. In particular, the delay and
direction of arrival of reflected soundwaves may be synthesized using information
describing the geometry and acoustical properties of reflective surfaces, and position and/or 
orientation of a listener within the environment. 

Amplifier gain and/or time delays may also be adapted to adjust HRTF responses to 
individual listener localization characteristics. ITD may be adjusted to account for 
variations in head size and shape. Amplifier gain may be adapted to adjust spectral 
shaping to account for size and shape of head and ear pinnae. In one embodiment of an 
acoustic display, a listener cycles through different coefficient matrices W while listening 
to the spatial effects and selects the matrix which provides the most desirable spatialization.

CLAIMS

1. A method for providing an acoustic display of aural information conveying apparent 
location, said method comprising the steps of: 

receiving one or more audio signals and one or more location signals 
representing one or more sources of aural information, wherein said location signals 
represent apparent locations for said sources, 

generating one or more first signals by applying a first network to said one or 
more audio signals, 

generating a plurality of filtered signals by applying a plurality of filters to said 
one or more first signals, and 

generating one or more output signals by applying a second network to said 
filtered signals, 
wherein 

a) said first signals are generated by applying a plurality of first amplifiers to 
said one or more audio signals, each of said first amplifiers having a respective gain 
adapted in response to said one or more location signals, 

and/or 

b) said second signals are generated by applying a plurality of second amplifiers 
to said filtered signals, each of said second amplifiers having a respective gain adapted 
in response to said one or more location signals. 

2. A method according to claim 1 wherein said first network comprises two or more 
groups of first amplifiers and a respective first signal is generated from the output of a 
respective group of first amplifiers, a respective first amplifier in each group amplifying one 
of said audio signals. 

3. A method according to claim 1 or 2 wherein said second network combines two or 
more of said filtered signals to generate a respective output signal. 

4. A method according to any one of claims 1 through 3 which further comprises 
delaying at least one of said plurality of first signals, the amount of delay adapted in response 
to said one or more location signals and/or a signal representing aural localization 
characteristics of a listener.

5. A method according to claim 1 wherein said second network comprises two or more
groups of second amplifiers and a respective output signal is generated by combining the
output of a respective group of second amplifiers, a respective second amplifier in each group
amplifying the filtered signal generated by a respective one of said plurality of filters.

6. A method according to claim 1 or 5 wherein said first network conveys each of said 
audio signals substantially without change as one or more of said first signals. 

7. A method according to any one of claims 1 through 6 which further comprises 
delaying at least one of said output signals, the amount of delay adapted in response to said
one or more location signals and/or a signal representing aural localization characteristics of
a listener. 

8. A method according to any one of claims 1 through 7 wherein one or more of said 
respective gains are adapted in response to a signal representing aural localization
characteristics of a listener. 

9. A method according to any one of claims 1 through 8 wherein, in response to a 
configuration signal, said first network and/or said second network are adapted to configure
said plurality of filters into one or more sets of filters, thereby providing for a variable number
of audio signals and/or a variable number of output signals. 

10. A method for providing an acoustic display of aural information conveying apparent 
location, said method comprising the steps of: 

receiving an audio signal representing said aural information and receiving a
location signal representing an apparent location for a source of said aural information,

generating a first filtered signal by applying a first filter to said audio signal,
said first filter having variable frequency response characteristics adapted in response
to said location signal,

generating one or more second filtered signals by applying a respective network
to said audio signal, each network comprising one or more second filters and one or
more amplifiers, each second filter having a respective unvarying frequency response
characteristic and each amplifier having a respective gain adapted in response to said
location signal, and

generating an output signal by combining said first filtered signal and said one
or more second filtered signals. 

11. A method according to claim 10 wherein a respective one of said second filtered 
signals is generated by applying a respective second filter to said audio signal and by
amplifying the output of said respective second filter using a respective one of said one or 
more amplifiers. 

12. A method according to claim 10 or 11 which further comprises delaying at least 
one of said second filtered signals, the amount of delay adapted in response to said location
signal and/or a signal representing aural localization characteristics of a listener. 

13. A method according to any one of claims 10 through 12 wherein said variable 
frequency response characteristics and/or said respective gains are adapted in response to a
signal representing aural localization characteristics of a listener.

14. A method according to any one of claims 1 through 13 wherein one or more of said 
respective gains are adapted in response to a position signal indicating the position of a 
listener.

15. A method according to any one of claims 1 through 14 wherein one or more of said 
respective gains are adapted in response to a signal representing ambient characteristics. 

16. A method according to any one of claims 1 through 15 wherein said plurality of 
filters have impulse responses which are substantially mutually orthogonal.

17. A method according to claim 16 wherein said impulse responses are derived by 
singular value decomposition of a given set of impulse responses. 

18. An apparatus for providing an acoustic display of aural information conveying
apparent location, said apparatus comprising: 

means for receiving one or more audio signals and one or more location signals 
representing one or more sources of aural information, wherein said location signals 
represent apparent locations for said sources,

means for generating one or more first signals by applying a first network of
a plurality of first amplifiers to said one or more audio signals, each of said first
amplifiers having a respective gain adapted in response to said one or more location
signals,

means for generating a plurality of filtered signals by applying a plurality of
filters to said one or more first signals, and 

means for generating one or more output signals by applying a second network 
to said filtered signals. 

19. An apparatus according to claim 18 wherein said first network comprises two or
more groups of first amplifiers and a respective first signal is generated from the output of a 
respective group of first amplifiers, a respective first amplifier in each group amplifying one 
of said audio signals. 

20. An apparatus according to claim 18 or 19 wherein said second network combines
two or more of said filtered signals to generate a respective output signal. 

21. An apparatus according to any one of claims 18 through 20 which further 
comprises means for delaying at least one of said plurality of first signals, the amount of delay
adapted in response to said one or more location signals and/or a signal representing aural
localization characteristics of a listener. 

22. An apparatus for providing an acoustic display of aural information conveying 
apparent location, said apparatus comprising: 

means for receiving one or more audio signals and one or more location signals
representing one or more sources of aural information, wherein said location signals
represent apparent locations for said sources,

means for generating one or more first signals by applying a first network to
said one or more audio signals,

means for generating a plurality of filtered signals by applying a plurality of
filters to said one or more first signals, and

means for generating one or more output signals by applying a second network
of a plurality of second amplifiers to said filtered signals, each of said second
amplifiers having a respective gain adapted in response to said one or more location
signals. 

23. An apparatus according to claim 22 wherein said second network comprises two 
or more groups of second amplifiers and a respective output signal is generated by combining
the output of a respective group of second amplifiers, a respective second amplifier in each 
group amplifying the filtered signal generated by a respective one of said plurality of filters. 

24. An apparatus according to claim 22 or 23 wherein said first network conveys each 
of said audio signals substantially without change as one or more of said first signals.

25. An apparatus according to any one of claims 22 through 24 which further 
comprises means for delaying at least one of said output signals, the amount of delay adapted 
in response to said one or more location signals and/or a signal representing aural localization
characteristics of a listener.

26. An apparatus according to any one of claims 18 through 25 wherein one or more 
of said respective gains are adapted in response to a signal representing aural localization 
characteristics of a listener.

27. An apparatus according to any one of claims 18 through 26 further comprising 
means for adapting, in response to a configuration signal, said first network and/or said second 
network to configure said plurality of filters into one or more sets of filters, thereby providing 
for a variable number of audio signals and/or a variable number of output signals.

28. An apparatus for providing an acoustic display of aural information conveying 
apparent location, said apparatus comprising: 

means for receiving an audio signal representing said aural information and 
receiving a location signal representing an apparent location for a source of said aural 
information,

means for generating a first filtered signal by applying a first filter to said audio
signal, said first filter having variable frequency response characteristics adapted in
response to said location signal,

means for generating one or more second filtered signals by applying a
respective network to said audio signal, each network comprising one or more second
filters and one or more amplifiers, each second filter having a respective unvarying
frequency response characteristic and each amplifier having a respective gain adapted
in response to said location signal, and

means for generating an output signal by combining said first filtered signal and
said one or more second filtered signals.

29. An apparatus according to claim 28 wherein a respective one of said second filtered 
signals is generated by applying a respective second filter to said audio signal and by 
amplifying the output of said respective second filter using a respective one of said one or
more amplifiers.

30. An apparatus according to claim 28 or 29 which further comprises means for 
delaying at least one of said second filtered signals, the amount of delay adapted in response 
to said location signal and/or a signal representing aural localization characteristics of a
listener.

31. An apparatus according to any one of claims 28 through 30 wherein said variable 
frequency response characteristics and/or said respective gains are adapted in response to a 
signal representing aural localization characteristics of a listener.

32. An apparatus according to any one of claims 18 through 31 wherein one or more 
of said respective gains are adapted in response to a position signal indicating the position of 
a listener. 

33. An apparatus according to any one of claims 18 through 32 wherein one or more
of said respective gains are adapted in response to a signal representing ambient 
characteristics. 

34. An apparatus according to any one of claims 18 through 33 wherein said plurality 
of filters have impulse responses which are substantially mutually orthogonal.

35. An apparatus according to claim 34 wherein said impulse responses are derived by 
singular value decomposition of a given set of impulse responses.